The aus_arrivals data set comprises quarterly international arrivals (in thousands) to Australia from Japan, New Zealand, UK and the US. Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries. Can you identify any unusual observations?
aus_arrivals |> autoplot(Arrivals)
Generally, the number of arrivals to Australia increases over the entire series, with the exception of Japanese visitors, whose numbers begin to decline after 1995. The series appear to have a seasonal pattern whose size varies in proportion to the number of arrivals. Interestingly, the number of visitors from NZ peaks sharply in 1988. The seasonal pattern for Japan appears to change substantially over time.
aus_arrivals |> gg_season(Arrivals, labels = "both")
The seasonal pattern of arrivals appears to differ between countries. In particular, arrivals from the UK appear to be lowest in Q2 and Q3 and increase substantially in Q4 and Q1, whereas arrivals from NZ are lowest in Q1 and highest in Q3. Similar variations can be seen for Japan and the US.
aus_arrivals |> gg_subseries(Arrivals)
The subseries plot reveals more interesting features. Although UK arrivals are increasing, most of this increase is seasonal: more visitors arrive during Q1 and Q4, while the increase in Q2 and Q3 is less pronounced. Growth in arrivals from NZ and the US appears fairly similar across all quarters. There is an unusual spike in arrivals from the US in 1992 Q3.
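As a rough numerical check on such spikes, one option is to find the largest quarter-on-quarter increase for each origin. This is a minimal sketch, assuming the fpp3 packages are installed; the names jumps and change are illustrative and not from the original solution, and this is a screening aid rather than a formal outlier test.

```r
library(fpp3)

# Quarter-on-quarter change within each origin; grouping by the key stops
# difference() from running across the boundary between two origins
jumps <- aus_arrivals |>
  group_by_key() |>
  mutate(change = difference(Arrivals)) |>
  ungroup() |>
  as_tibble() |>
  group_by(Origin) |>
  # Keep the quarter with the largest increase for each origin
  slice_max(change, n = 1) |>
  ungroup()

jumps
```

A large jump only flags a candidate unusual quarter; it still needs to be read against each country's seasonal pattern, since some seasonal increases are routine.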
Unusual observations:
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and us_gasoline.
- Can you spot any seasonality, cyclicity and trend?
- What do you learn about the series?
- What can you say about the seasonal patterns?
- Can you identify any unusual years?
us_employment |>
  filter(Title == "Total Private") |>
  autoplot(Employed)
There is a strong trend and seasonality. Some cyclic behaviour is seen, with a big drop due to the global financial crisis.
us_employment |>
  filter(Title == "Total Private") |>
  gg_season(Employed)
us_employment |>
  filter(Title == "Total Private") |>
  gg_subseries(Employed)
us_employment |>
  filter(Title == "Total Private") |>
  gg_lag(Employed)
us_employment |>
  filter(Title == "Total Private") |>
  ACF(Employed) |>
  autoplot()
In all of these plots, the trend is so dominant that it is hard to see anything else. We need to remove the trend so we can explore the other features of the data.
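One simple way to remove the trend is to take first differences (month-to-month changes) with tsibble's difference() before looking at the ACF. The sketch below assumes the fpp3 packages are installed; first differencing is just one illustrative choice, and seasonal differencing or a decomposition would also work. The column name change is hypothetical.

```r
library(fpp3)

# First differences remove most of the trend in Total Private employment
private_diff <- us_employment |>
  filter(Title == "Total Private") |>
  mutate(change = difference(Employed))

# ACF of the differenced series; the leading NA produced by difference()
# is handled by ACF()'s missing-value handling (na.action)
private_diff |>
  ACF(change, lag_max = 48) |>
  autoplot()
```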
aus_production |>
  autoplot(Bricks)
There is a positive trend over the first 20 years and a negative trend over the next 25 years. There is strong quarterly seasonality, with some cyclicity; note the recessions in the 1970s and 1980s.
aus_production |>
  gg_season(Bricks)
Brick production tends to be lowest in the first quarter and peak in either quarter 2 or quarter 3.
aus_production |>
  gg_subseries(Bricks)
The decrease in the last 25 years has been weakest in Q1.
aus_production |>
  gg_lag(Bricks, geom='point')
aus_production |>
  ACF(Bricks) |> autoplot()
The seasonality shows up as peaks at lags 4, 8, 12, 16, 20, and so on. The trend shows up as the slow decay of the positive autocorrelations.
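To see why trend plus quarterly seasonality produces this ACF shape, here is a small base R simulation. It uses synthetic data (a linear trend, a fixed four-quarter pattern and Gaussian noise, all chosen arbitrarily), not the Bricks series, and uses stats::acf() rather than feasts' ACF().

```r
# Synthetic quarterly series: linear trend + fixed seasonal pattern + noise.
# Illustrative only; simulated data, not the Bricks series.
set.seed(1)
n <- 200
trend <- 0.1 * seq_len(n)
season <- rep(c(-10, 5, 10, -5), length.out = n)
y <- trend + season + rnorm(n, sd = 2)

# Sample ACF for lags 0..20 (base R stats::acf), named by lag
r <- drop(acf(y, lag.max = 20, plot = FALSE)$acf)
names(r) <- 0:20

# Seasonal lags (4, 8, 12, ...) stand out as peaks, while the trend keeps
# the peaks positive and slowly decaying
round(r[c("2", "4", "6", "8", "12")], 2)
```

The half-season lags (2, 6, ...) are pulled down by the seasonal pattern and pulled up by the trend, so the seasonal lags sit well above them.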
pelt |>
  autoplot(Hare)
There is some cyclic behaviour with substantial variation in the length of the period.
pelt |>
  gg_lag(Hare, geom='point')
pelt |>
  ACF(Hare) |> autoplot()
The cycle length seems to average about 10 years, as indicated by the local maximum in the ACF at lag 10.
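To locate that local maximum programmatically, a minimal sketch (assuming the fpp3 packages are installed, for the pelt data) applies base R's stats::acf() to the Hare column and flags lags where the ACF rises into the lag and falls after it. This sign-change rule is a simple heuristic of our own, not a feasts function.

```r
library(fpp3)

# Sample ACF of the annual hare pelt series (base R stats::acf), lag 0 dropped
r <- drop(acf(pelt$Hare, lag.max = 20, plot = FALSE)$acf)[-1]
lags <- seq_along(r)

# A lag is a local maximum when the ACF slope changes from + to -
is_peak <- c(FALSE, diff(sign(diff(r))) == -2, FALSE)
lags[is_peak]
```

Sampling noise can create extra small local maxima, so the printed lags should be read alongside the ACF plot.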
There are four series corresponding to H02 sales, so we will add them together.
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  group_by(ATC2) |>
  summarise(Cost = sum(Cost)) |>
  ungroup()
h02 |>
  autoplot(Cost)
There is a positive trend and strong seasonality, with a sudden drop every February.
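One way to confirm which calendar months are high and low is to average the cost for each month across all years. This is a sketch assuming the fpp3 packages are installed; it rebuilds h02 with the same pipeline as above so that it runs on its own, and the column names cal_month and mean_cost are illustrative.

```r
library(fpp3)

# Total H02 cost per month, summed over the four series (same pipeline as above)
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  group_by(ATC2) |>
  summarise(Cost = sum(Cost)) |>
  ungroup()

# Average cost for each calendar month (1 = January), pooling all years
monthly_means <- h02 |>
  as_tibble() |>
  mutate(cal_month = month(Month)) |>
  group_by(cal_month) |>
  summarise(mean_cost = mean(Cost))

# Calendar months with the highest and lowest average cost
monthly_means |> slice_max(mean_cost, n = 1)
monthly_means |> slice_min(mean_cost, n = 1)
```

Averages pooled across years are inflated by the trend in later years, so this is only a coarse check on the seasonal shape.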
h02 |>
  gg_season(Cost)
h02 |>
  gg_subseries(Cost)
The trends have been steeper in the months with higher peaks, which leads to increasing seasonal variation.
h02 |>
  gg_lag(Cost, geom='point', lags=1:16)
h02 |>
  ACF(Cost) |> autoplot()
The large January sales show up as a separate cluster of points in the lag plots. The strong seasonality is clear in the ACF plot.
us_gasoline |>
  autoplot(Barrels)
There is a positive trend until 2008; the global financial crisis then led to a drop in sales until 2012. The shape of the seasonality seems to have changed over time.
us_gasoline |>
  gg_season(Barrels)
There is a lot of noise, making it hard to see the overall seasonal pattern. However, sales seem to drop towards the end of Q4.
us_gasoline |>
  gg_subseries(Barrels)
The blue lines, which mark the mean for each week, help reveal the average seasonal pattern.
us_gasoline |>
  gg_lag(Barrels, geom='point', lags=1:16)
us_gasoline |>
  ACF(Barrels, lag_max = 150) |> autoplot()
The seasonality can be seen if we increase the number of lags to cover at least 2 years (approximately 104 weeks).
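A year does not contain a whole number of weeks, so for weekly data the seasonal period is not an integer. A back-of-envelope calculation, assuming a 365.25-day year:

```r
# Average number of weeks in a year (365.25-day year assumed)
weeks_per_year <- 365.25 / 7
weeks_per_year                 # about 52.18

# ACF lags needed to cover two full years of weekly data
ceiling(2 * weeks_per_year)    # 105

# Years covered by the lag_max = 150 used above
150 / weeks_per_year           # about 2.87
```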
The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.
The aus_livestock data contains the monthly total number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018. Use filter() to extract pig slaughters in Victoria between 1990 and 1995. Use autoplot and ACF for this data. How do they differ from white noise? If a longer period of data is used, what difference does it make to the ACF?
vic_pigs <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", between(year(Month), 1990, 1995))
vic_pigs
## # A tsibble: 72 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1990 Jan Pigs Victoria 76000
## 2 1990 Feb Pigs Victoria 78100
## 3 1990 Mar Pigs Victoria 77600
## 4 1990 Apr Pigs Victoria 84100
## 5 1990 May Pigs Victoria 98000
## 6 1990 Jun Pigs Victoria 89100
## 7 1990 Jul Pigs Victoria 93500
## 8 1990 Aug Pigs Victoria 84700
## 9 1990 Sep Pigs Victoria 74500
## 10 1990 Oct Pigs Victoria 91900
## # ℹ 62 more rows
vic_pigs |>
  autoplot(Count)
Although the values appear to vary erratically between months, a general upward trend is evident between 1990 and 1995. In contrast, a white noise plot does not exhibit any trend.
vic_pigs |> ACF(Count) |> autoplot()
The first 14 lags are significant and the ACF decays slowly, which suggests that the data contain a trend. A white noise ACF plot would not usually contain any significant lags. The large spike at lag 12 suggests there is some seasonality in the data.
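For white noise, we expect about 95% of the sample autocorrelations to lie within ±1.96/√T, where T is the length of the series; these are the dashed blue lines in the ACF plot. A base R sketch for the T = 72 observations here, with a simulated white-noise series of the same length for comparison (simulated data, arbitrary seed):

```r
# 95% bounds for the ACF of white noise with T = 72 observations
T_obs <- 72
bound <- 1.96 / sqrt(T_obs)
round(bound, 3)   # 0.231

# Simulated white noise of the same length, for comparison
set.seed(42)
wn <- rnorm(T_obs)
r_wn <- drop(acf(wn, lag.max = 24, plot = FALSE)$acf)[-1]

# Lags (out of 24) outside the bounds; for white noise this should be small,
# about 24 * 0.05 = 1.2 on average
sum(abs(r_wn) > bound)
```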
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  ACF(Count) |>
  autoplot()
The longer series has much larger autocorrelations, plus clear evidence of seasonality at the seasonal lags of \(12, 24, \dots\).
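A longer series also narrows the white-noise bounds, since they scale as 1/√T, so smaller autocorrelations count as significant. A sketch comparing the two bounds, assuming the fpp3 packages are installed (the object names are illustrative):

```r
library(fpp3)

# Number of monthly observations in the full and the 1990-1995 series
pigs_full <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria")
n_short <- pigs_full |>
  filter(between(year(Month), 1990, 1995)) |>
  nrow()
n_full <- nrow(pigs_full)

# Half-width of the 95% white-noise bounds for each series length
c(short = 1.96 / sqrt(n_short), full = 1.96 / sqrt(n_full))
```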
Use the following code to compute the daily changes in Google closing stock prices.
dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))
- Why was it necessary to re-index the tsibble?
- Plot these differences and their ACF.
- Do the changes in the stock prices look like white noise?
dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))
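The tsibble needed re-indexing because trading happens irregularly: there is no trading on weekends or public holidays, so the Date index has gaps. The new index counts trading days only, giving a regular series.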
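The irregularity is easy to see by tabulating the gaps between consecutive trading dates. A sketch assuming the fpp3 packages are installed; goog is an illustrative name for the series before re-indexing.

```r
library(fpp3)

# GOOG prices from 2018 onwards, still indexed by calendar Date
goog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018)

# Gaps (in days) between consecutive trading dates: mostly 1, but larger
# across weekends and holidays, so Date is not a regular index
table(as.numeric(diff(goog$Date)))
```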
dgoog |> autoplot(diff)
dgoog |> ACF(diff, lag_max=100) |> autoplot()
There are a few small significant autocorrelations out to lag 24, but none after that. Given that each lag has a 5% probability of a false positive, the changes look similar to white noise.
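To put the 5% false-positive rate in numbers: if we treat the 100 lags as independent tests (they are not exactly independent, so this is only an approximation), the count of lags outside the bounds under white noise is roughly Binomial(100, 0.05). A base R sketch:

```r
n_lags <- 100
alpha <- 0.05

# Expected number of lags outside the 95% bounds if the series is white noise
n_lags * alpha                   # 5

# Count exceeded only about 5% of the time under white noise
qbinom(0.95, n_lags, alpha)      # 9
```

On this rough yardstick, a handful of small exceedances among 100 lags is consistent with white noise.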